The hexscape package has a number of uses (some of which are not yet implemented):

- Facilitate use of vectorised spatial data for EU/EEA countries, and/or regions/areas (as defined by NUTS1/2/3) of these countries
- Facilitate use of additional vectorised spatial data on parishes, kommuner and postcodes for Denmark
- Subdivision of the above spatial data into "patches" based on either polygons of arbitrary size, or Voronoi tessellation around points of interest supplied by the user
- Facilitate use of Corine land cover data for EU/EEA countries (or regions/areas thereof) and aggregation of different land cover types within the spatial areas and/or patches defined above
- Facilitate generating networks of distances and/or adjacencies for these polygons
The data come from a combination of Eurostat, Corine, and other sources, all of which are freely available online. Much of the internal functionality is provided by the sf package.
The package isn’t on CRAN (yet?), but the most recent stable (ish) version can be installed from our drat repository using:
install.packages("hexscape", repos = c(
  CRAN = "https://cran.rstudio.com/",
  `ku-awdc` = "https://ku-awdc.github.io/drat/"
))
Or, for the most recent (but even less stable) version you can install from GitHub:
# remotes::install_github("ku-awdc/hexscape")
Then you should be able to load the package:
library("hexscape")
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Note that this also currently loads tidyverse, as that is what I used to develop the package, and I was too lazy to import everything properly. At some point, this will change, so it is probably best to explicitly load tidyverse and sf (due to the heavy reliance on sf data frames) as well:
library("tidyverse")
library("sf")
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
(Unless of course you don’t use tidyverse, in which case you should probably read https://r4ds.had.co.nz/index.html a couple more times)
The hexscape package uses local caching of intermediate data files, as most of the spatial operations take quite a long time to process from the raw data. So, the first step is to create a folder somewhere on your hard drive (NOT a network storage drive) to/from which files can be saved/loaded:
path_to_folder <- "/some/path/to/a/folder"
Then run the following code:
set_storage_folder(path_to_folder)
You should now see that hexscape has created some subfolders:
list.files(path_to_folder)
## [1] "eurostat_cache" "landscapes" "processed_data" "raw_data"
This storage folder needs to be set every time you load hexscape. To avoid having to do this manually, you can put the following code in your .Rprofile:
## To open your .Rprofile try:
usethis::edit_r_profile()
## Then add the following to the bottom of that file:
Sys.setenv("HEXSCAPE_STORAGE"="/some/path/to/a/folder")
# [obviously replacing /some/path/to/a/folder with your path]
That way, the storage folder will be set for you automatically when hexscape is loaded.
The hexscape package makes heavy use of NUTS (Nomenclature of territorial units for statistics; 2021 version) to define spatial areas. These are defined at levels 0 (national) to 3 (smallest scale), with the number of NUTS units depending on the country. Some datasets are inbuilt to help with cross-referencing:
nuts_codes
## # A tibble: 2,010 × 5
## Code Country Level NUTS Label
## <chr> <chr> <int> <chr> <chr>
## 1 AL Albania 0 AL Shqipëria
## 2 AT Austria 0 AT Österreich
## 3 BE Belgium 0 BE Belgique/België
## 4 BG Bulgaria 0 BG България
## 5 CH Switzerland 0 CH Schweiz/Suisse/Svizzera
## 6 CY Cyprus 0 CY Κύπρος
## 7 CZ Czechia 0 CZ Česko
## 8 DE Germany 0 DE Deutschland
## 9 DK Denmark 0 DK Danmark
## 10 EE Estonia 0 EE Eesti
## # … with 2,000 more rows
nuts_codes |> filter(Code=="DK")
## # A tibble: 18 × 5
## Code Country Level NUTS Label
## <chr> <chr> <int> <chr> <chr>
## 1 DK Denmark 0 DK Danmark
## 2 DK Denmark 1 DK0 Danmark
## 3 DK Denmark 2 DK01 Hovedstaden
## 4 DK Denmark 2 DK02 Sjælland
## 5 DK Denmark 2 DK03 Syddanmark
## 6 DK Denmark 2 DK04 Midtjylland
## 7 DK Denmark 2 DK05 Nordjylland
## 8 DK Denmark 3 DK011 Byen København
## 9 DK Denmark 3 DK012 Københavns omegn
## 10 DK Denmark 3 DK013 Nordsjælland
## 11 DK Denmark 3 DK014 Bornholm
## 12 DK Denmark 3 DK021 Østsjælland
## 13 DK Denmark 3 DK022 Vest- og Sydsjælland
## 14 DK Denmark 3 DK031 Fyn
## 15 DK Denmark 3 DK032 Sydjylland
## 16 DK Denmark 3 DK041 Vestjylland
## 17 DK Denmark 3 DK042 Østjylland
## 18 DK Denmark 3 DK050 Nordjylland
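The nuts_codes lookup can also be filtered by Level to pull out a specific spatial scale; for example, only the NUTS2 regions of Denmark:

```r
# NUTS2 (regional) units for Denmark only:
nuts_codes |>
  filter(Code == "DK", Level == 2)
```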
There is also a list of countries indicating whether or not that country has Eurostat spatial information available (and can therefore be used with the hexscape package):
country_codes
## # A tibble: 42 × 6
## Code Country Eurostat Native French German
## <chr> <chr> <lgl> <chr> <chr> <chr>
## 1 AL Albania TRUE Shqipëria Albanie Alban…
## 2 AT Austria TRUE Österreich Autriche Öster…
## 3 BA Bosnia and Herzegovina FALSE Bosna i Hercegovina Bosnie … Bosni…
## 4 BE Belgium TRUE Belgique/België Belgique Belgi…
## 5 BG Bulgaria TRUE Bulgarija Bulgarie Bulga…
## 6 CH Switzerland TRUE Schweiz/Suisse/Svizzera Suisse Schwe…
## 7 CY Cyprus TRUE Kýpros Chypre Zypern
## 8 CZ Czechia TRUE Česko Tchéquie Tsche…
## 9 DE Germany TRUE Deutschland Allemag… Deuts…
## 10 DK Denmark TRUE Danmark Danemark Dänem…
## # … with 32 more rows
We use the spatial data provided by Eurostat to define the boundaries of NUTS areas. This can be downloaded from: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts#nuts21
You need to select the following options: NUTS year 2021, file format SHP, geometry type RG (regions), scale 01M, and CRS EPSG:3035 (these correspond to the folder name below).
Then hit the download button and extract the resulting ZIP file. You should end up with a 29.2 MB folder called “NUTS_RG_01M_2021_3035.shp”. Put this folder inside the “raw_data” folder of the hexscape storage folder you created above.
You should then be able to load a map for any NUTS area at any level, for example:
dk <- load_map("DK")
dk
## Simple feature collection with 1 feature and 5 fields
## Active geometry column: geometry
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 4199561 ymin: 3496605 xmax: 4650503 ymax: 3850132
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 1 × 7
## Code Country NUTS Level Label centroid
## * <chr> <chr> <chr> <int> <chr> <POINT [m]>
## 1 DK Denmark DK 0 Danmark (4278397 3656722)
## # … with 1 more variable: geometry <MULTIPOLYGON [m]>
ggplot(dk) + geom_sf() + theme_void()
Or for a sub-region of a country:
sj <- load_map("DK032")
ggplot(sj) + geom_sf() + theme_void()
The first time maps for a specific country are fetched, it might take a couple of seconds to process the raw data; subsequent loads come from a country-specific cache (almost instantaneously).
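If you need several areas at once then one simple approach (assuming, as above, that load_map() takes a single NUTS code at a time) is to loop over the codes and bind the results into a single sf data frame:

```r
# Load all five NUTS2 regions of Denmark (codes taken from the
# nuts_codes lookup shown earlier) and combine them for plotting:
dk_nuts2 <- c("DK01", "DK02", "DK03", "DK04", "DK05") |>
  map(load_map) |>
  bind_rows()

ggplot(dk_nuts2) + geom_sf(aes(fill = Label)) + theme_void()
```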
The hexscape package uses Corine Land Cover (CLC) data as the raw underlying data for land usage. To use this, you need to download the SQLite Database from here: https://land.copernicus.eu/pan-european/corine-land-cover/clc2018?tab=download
You will first need to register an account and log in, but it is free to do so.
Once the file has downloaded (beware: it is 3.5GB), unzip it and you should hopefully end up with a folder named u2018_clc2018_v2020_20u1_geoPackage. Move (or copy/symlink if you prefer) this folder inside the raw_data folder that you set up above, making sure to keep the folder name exactly as given.
The CLC data classifies land usage under a number of different categories, grouped at 3 different levels. To see what codes are defined use the following lookup table:
clc_codes
## # A tibble: 44 × 5
## CLC CLC_Label1 CLC_Label2 CLC_L…¹ CLC_RGB
## <chr> <chr> <chr> <chr> <chr>
## 1 111 Artificial surfaces Urban fabric Contin… 230-00…
## 2 112 Artificial surfaces Urban fabric Discon… 255-00…
## 3 121 Artificial surfaces Industrial, commercial and transpo… Indust… 204-07…
## 4 122 Artificial surfaces Industrial, commercial and transpo… Road a… 204-00…
## 5 123 Artificial surfaces Industrial, commercial and transpo… Port a… 230-20…
## 6 124 Artificial surfaces Industrial, commercial and transpo… Airpor… 230-20…
## 7 131 Artificial surfaces Mine, dump and construction sites Minera… 166-00…
## 8 132 Artificial surfaces Mine, dump and construction sites Dump s… 166-07…
## 9 133 Artificial surfaces Mine, dump and construction sites Constr… 255-07…
## 10 141 Artificial surfaces Artificial, non-agricultural veget… Green … 255-16…
## # … with 34 more rows, and abbreviated variable name ¹CLC_Label3
The final column gives a suggested colour code for plotting each land cover type.
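The truncated output above suggests that these colours are stored as dash-separated RGB strings; if so, a small helper function (hypothetical, not part of hexscape) can convert them to standard hex codes:

```r
# Convert a dash-separated "R-G-B" string with components in 0-255
# (e.g. "230-000-077") into a hex colour string:
rgb_to_hex <- function(x) {
  vapply(strsplit(x, "-"), function(v) {
    rgb(as.numeric(v[1]), as.numeric(v[2]), as.numeric(v[3]),
        maxColorValue = 255)
  }, character(1))
}

rgb_to_hex("230-000-077")
## [1] "#E6004D"
```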
The raw Corine data contains data on all polygons for all countries (i.e. it is vector rather than raster). In order to use it, we need to process the raw data to intersect with the specific NUTS1 area(s) we are interested in, and simplify the polygons slightly (for easier processing and plotting downstream). This is done automatically the first time you request a map, then the data is cached and re-used for subsequent requests of the same NUTS1 area. Simplification of polygons from different CLC codes is done ensuring that the total area remains the same, i.e. none of the simplified CLC polygons overlap, and the returned result is still a valid sf object. This all takes some time to process… For example, caching data for Denmark (NUTS1 code DK0) takes around 3 mins to run on my arm64 laptop (and more than double that on my ageing Xeon desktop):
dk_corine <- load_corine("DK0")
But subsequent requests for any data from within Denmark are effectively instantaneous, for example:
dk032_corine <- load_corine("DK032")
## Returning corine data (by CLC and NUTS3 area)
ggplot(dk032_corine, aes(fill=CLC)) +
  geom_sf(lwd=0, colour=NA) +
  theme_void() +
  theme(legend.position="none") +
  scale_fill_manual(values=clc_codes$CLC_RGB)
## TODO: implement an autoplot method
The default return value is a modified sf data frame with one row per combination of CLC code and NUTS3 area in the requested region, and columns giving the CLC code and descriptions, country code and name, NUTS3 code, total area of the CLC type in the original data (in km2), total area of the CLC type in the simplified data, and the geometry. If you prefer a single row per CLC type for the entire area, then you can specify union=TRUE to load_corine().
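For example, to obtain a single (unioned) polygon per CLC type for the whole of DK032:

```r
# One row per CLC type, with geometries unioned across NUTS3 areas:
dk032_union <- load_corine("DK032", union = TRUE)
```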
The number of NUTS1 codes per country varies from 1 (around half of the countries) up to 16 (Germany), and each NUTS1 area is cached separately. If you request data for a whole country, this might mean it takes a long time to process all of the relevant NUTS1 areas.
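You can check how many NUTS1 areas a country has from the inbuilt nuts_codes lookup before committing to a long processing run:

```r
# Number of NUTS1 units per country (Germany tops the list with 16):
nuts_codes |>
  filter(Level == 1) |>
  count(Country, name = "NUTS1_areas") |>
  arrange(desc(NUTS1_areas))
```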
If you know that you need to process a lot of different NUTS1 areas then you can set use_cache=TRUE for load_corine(); this does some pre-processing of the entire raw Corine database which takes around 5 minutes the first time, but speeds up subsequent calls to load_corine (for any NUTS1 area). The downside is that this uses around 5GB of extra hard drive space to store the cached intermediate files. Or you can ask me nicely for a specific set of NUTS1 areas and I will send the pre-processed cache files to you … but please do register and download the Corine database yourself so that I am not violating their usage policies.
The load_corine() function provides spatial data for every CLC code separately, but you may want to filter and/or combine them. For now you have to do that manually; for example, to extract only farmland:
dk032_corine |>
  filter(CLC_Label1 == "Agricultural areas") |>
  summarise(Area = sum(Area), geometry = st_union(geometry)) ->
  dk032_farmland
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green") +
theme_void()
TODO: At some point I will probably write a helper function e.g. aggregate_corine() for this…
TODO: incorporate code for kommune, sogne, and postcode shape files from Covid19tools package
Future TODO: maybe include other info (e.g. population densities) from Danmarks Statistik (Statistics Denmark) via their API (or using https://github.com/rOpenGov/dkstat ??)
One option to discretise the spatial data is using Voronoi tessellation around a given set of spatial points. In real life these could be farm locations; let’s simulate N=100 farms within the farmland of south Jutland:
N <- 100L
tibble(Index = 1:N, point = st_sample(dk032_farmland, N)) |>
  st_as_sf(sf_column_name = "point") ->
  fake_farms
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(data=fake_farms, col="dark green") +
theme_void()
Then we can produce Voronoi tessellated polygons within this farmland as follows (where the first argument is the spatial mask for the area of interest, and the second is the set of points around which the tessellation is constructed):
dk032_voronoi <- discretise_voronoi(dk032_farmland, fake_farms)
This returns the original data frame, plus new columns for Area, centroid (of the Voronoi tessellated area after intersecting with the mask), and geometry.
dk032_voronoi
## Simple feature collection with 100 features and 2 fields
## Active geometry column: geometry
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 4204284 ymin: 3521791 xmax: 4325551 ymax: 3650354
## Projected CRS: ETRS89-extended / LAEA Europe
## # A tibble: 100 × 5
## Index point Area centroid
## <int> <POINT [m]> <dbl> <POINT [m]>
## 1 1 (4235922 3548987) 83.8 (4239183 3548777)
## 2 2 (4240705 3592928) 59.1 (4239881 3589245)
## 3 3 (4261709 3600338) 34.6 (4264210 3600988)
## 4 4 (4267589 3613024) 66.7 (4266332 3614551)
## 5 5 (4274670 3630526) 53.6 (4277342 3629021)
## 6 6 (4210873 3613283) 44.1 (4212186 3615111)
## 7 7 (4239227 3557662) 106. (4241967 3556707)
## 8 8 (4229331 3595601) 65.8 (4228001 3600445)
## 9 9 (4272113 3631163) 70.3 (4268663 3631170)
## 10 10 (4241146 3595963) 11.2 (4240739 3596029)
## # … with 90 more rows, and 1 more variable: geometry <GEOMETRY [m]>
These can be plotted to show the expected discrepancy between the original farm locations and the polygon centroids, e.g.:
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(data=dk032_voronoi, fill="transparent", col="red") +
geom_sf(data=dk032_voronoi, aes(geometry=centroid), col="dark green") +
geom_sf(data=fake_farms, col="red") +
theme_void()
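For interest, the core of this operation can be sketched using plain sf (an illustrative simplification only; discretise_voronoi() additionally computes areas and centroids and ensures the result is a valid sf object):

```r
library("sf")

# Sketch: build Voronoi polygons around the points, then clip to the mask.
# Note that without specifying a larger envelope, cells at the edge of the
# points' bounding box may be truncated before clipping.
voronoi_sketch <- function(mask, points) {
  st_voronoi(st_union(st_geometry(points))) |>
    st_collection_extract("POLYGON") |>
    st_intersection(st_geometry(mask))
}
```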
TODO
TODO
TODO
TODO
A number of helper/utility functions are provided (mostly wrapping functionality in sf), some of which are described below.
To easily sample random points within a data frame containing polygons (either Voronoi tessellations or hexagons or anything else), you can use the sample_points function. The first argument is an sf data frame with one or more rows and a column giving the Index, and the size argument gives the number of points to sample within each row.
random_points <- sample_points(dk032_voronoi, size=5L, verbose=0L)
random_points
## Simple feature collection with 500 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 4209388 ymin: 3526626 xmax: 4321513 ymax: 3646143
## Projected CRS: ETRS89-extended / LAEA Europe
## First 10 features:
## Index geometry
## 1 1 POINT (4235621 3549235)
## 2 1 POINT (4235041 3551647)
## 3 1 POINT (4245364 3549296)
## 4 1 POINT (4236230 3544525)
## 5 1 POINT (4237313 3544502)
## 6 2 POINT (4237018 3589317)
## 7 2 POINT (4239699 3584593)
## 8 2 POINT (4239536 3584942)
## 9 2 POINT (4239103 3591423)
## 10 2 POINT (4240638 3587916)
The points are generated by rejection sampling over a bounding box that iteratively decreases in size once enough points have been found for each individual polygon. To plot the points:
ggplot() +
geom_sf(data=load_map("DK032")) +
geom_sf(data=dk032_farmland, fill="dark green", alpha=0.5) +
geom_sf(data=dk032_voronoi, fill="transparent", col="red") +
geom_sf(data=random_points, col="black", size=0.3) +
theme_void()
In this case, each farmland polygon should contain exactly 5 random points.
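This can be verified with a quick count (dropping the geometry first so that count() operates on a plain data frame):

```r
# Count sampled points per polygon Index; every count should equal 5:
random_points |>
  st_drop_geometry() |>
  count(Index) |>
  summarise(all_correct = all(n == 5))
```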
The package is still under active development and subject to some change; however, we will try to keep the interfaces discussed above consistent. In particular, we acknowledge that the help files are fairly non-existent at the moment (sorry), but these will be updated ASAP. When new release versions are uploaded to drat, this guide will be updated with the new features.
At some point soon-ish we will summarise the main features of this package as part of a paper (linked to both Mossa’s PhD and DigiVet). In the meantime, comments and suggestions are very welcome!
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.6.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sf_1.0-9 hexscape_0.5.0-1 forcats_1.0.0 stringr_1.5.0
## [5] dplyr_1.1.0 purrr_1.0.1 readr_2.1.3 tidyr_1.3.0
## [9] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 pbapply_1.7-0
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.0 lubridate_1.9.1 httr_1.4.4
## [4] rprojroot_2.0.3 tools_4.2.2 backports_1.4.1
## [7] bslib_0.4.2 utf8_1.2.2 R6_2.5.1
## [10] KernSmooth_2.23-20 DBI_1.1.3 colorspace_2.1-0
## [13] withr_2.5.0 tidyselect_1.2.0 curl_5.0.0
## [16] compiler_4.2.2 cli_3.6.0 rvest_1.0.3
## [19] xml2_1.3.3 stringfish_0.15.7 sass_0.4.5
## [22] scales_1.2.1 rmapshaper_0.4.6 classInt_0.4-8
## [25] proxy_0.4-27 digest_0.6.31 rmarkdown_2.20
## [28] pkgconfig_2.0.3 htmltools_0.5.4 bibtex_0.5.1
## [31] highr_0.10 dbplyr_2.3.0 fastmap_1.1.0
## [34] jsonvalidate_1.3.2 rlang_1.0.6 readxl_1.4.1
## [37] rstudioapi_0.14 httpcode_0.3.0 farver_2.1.1
## [40] jquerylib_0.1.4 generics_0.1.3 RApiSerialize_0.1.2
## [43] jsonlite_1.8.4 googlesheets4_1.0.1 magrittr_2.0.3
## [46] Rcpp_1.0.10 munsell_0.5.0 fansi_1.0.4
## [49] RefManageR_1.4.0 lifecycle_1.0.3 stringi_1.7.12
## [52] yaml_2.3.7 plyr_1.8.8 grid_4.2.2
## [55] parallel_4.2.2 regions_0.1.8 crayon_1.5.2
## [58] lattice_0.20-45 haven_2.5.1 hms_1.1.2
## [61] knitr_1.42 pillar_1.8.1 geojsonlint_0.4.0
## [64] crul_1.3 reprex_2.0.2 glue_1.6.2
## [67] evaluate_0.20 V8_4.2.2 RcppParallel_5.1.6
## [70] modelr_0.1.10 runjags_2.2.2-1 vctrs_0.5.2
## [73] tzdb_0.3.0 cellranger_1.1.0 gtable_0.3.1
## [76] qs_0.25.4 assertthat_0.2.1 cachem_1.0.6
## [79] xfun_0.36 eurostat_3.7.10 broom_1.0.3
## [82] countrycode_1.4.0 e1071_1.7-12 coda_0.19-4
## [85] class_7.3-21 googledrive_2.0.0 gargle_1.2.1
## [88] units_0.8-1 timechange_0.2.0 ellipsis_0.3.2
## [91] here_1.0.1